A Proof of Theorem 1
A.1 Proof Sketch
We first introduce the following lemma: Lemma 1. By Theorem 5, we can also show the superiority of FedSubAvg over FedAvg. We then assume that FedSubAvg always activates all the clients at the beginning of each communication round, but uses only the parameters maintained by a few selected clients to generate the next-round parameters. It is clear that this update scheme is equivalent to the original one.
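The equivalence claimed above can be checked mechanically: since the work of unselected clients is simply discarded, "activate everyone, then keep a selected few" yields the same next-round parameters as "activate only the selected few." A minimal sketch (all names and the toy local update are illustrative, not the paper's algorithm):

```python
import random

def local_update(params, client_id):
    # Toy deterministic local step (illustrative only): each client
    # shifts the parameters by a client-specific amount.
    return [p + 0.01 * (client_id + 1) for p in params]

def round_all_then_select(params, clients, k, seed):
    # Scheme from the proof: activate ALL clients, then aggregate
    # only the k selected clients' parameters.
    all_updates = {c: local_update(params, c) for c in clients}
    selected = random.Random(seed).sample(clients, k)
    chosen = [all_updates[c] for c in selected]
    return [sum(v) / k for v in zip(*chosen)]

def round_select_then_update(params, clients, k, seed):
    # Original scheme: activate only the k selected clients.
    selected = random.Random(seed).sample(clients, k)
    chosen = [local_update(params, c) for c in selected]
    return [sum(v) / k for v in zip(*chosen)]

params = [0.0, 1.0]
clients = list(range(10))
a = round_all_then_select(params, clients, 3, seed=42)
b = round_select_then_update(params, clients, 3, seed=42)
assert a == b  # identical next-round parameters under either scheme
```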
We first consider the condition number of Ĥ when X is in a locally convex area. In general, it is hard to develop a convergence rate for objective values; however, when the global model is in a locally convex area of f, we can obtain a relationship between the gradient and the local optimum. Theorem 4. When there is no parameter heat dispersion, and X is in a µ-strongly convex area of f. We note that there is a difference between equations 18 and 21: for each client i, equation 18 involves all the parameters of the full model, while equation 21 involves only the partial parameters of the submodel. This causes a change in the lower bound of T(Y) and further leads to a change of conclusion.
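For reference, the relationship between the gradient and the local optimum used at this step typically rests on a textbook consequence of µ-strong convexity (this is a standard fact, not a reconstruction of the paper's equations 18 or 21): if f is µ-strongly convex on a region containing X and the local optimum X⋆, then

```latex
\|\nabla f(X)\|^{2} \;\ge\; 2\mu \left( f(X) - f(X^{\star}) \right),
```

so an upper bound on the gradient norm directly controls the suboptimality gap.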
Federated Submodel Optimization for Hot and Cold Data Features
Yucheng Ding, Fan Wu, Shaojie Tang
We focus on federated learning in practical recommender systems and natural language processing scenarios. The global model for federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with only part of the features, so each client updates only a small submodel consisting of the feature-related embedding vectors. We identify a new and important issue: distinct data features normally involve different numbers of clients, generating a differentiation of hot and cold features. We further reveal that the classical federated averaging algorithm (FedAvg) and its variants, which randomly select clients to participate and uniformly average their submodel updates, will be severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features will be updated quickly (resp., slowly).
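The hot/cold differentiation described above can be illustrated with a small simulation (all names and numbers here are hypothetical): under FedAvg-style random client sampling, a feature's embedding row is updated only when a sampled client's local data involve that feature, so hot features accumulate far more updates per round than cold ones.

```python
import random

random.seed(0)

NUM_CLIENTS = 1000
NUM_ROUNDS = 100
CLIENTS_PER_ROUND = 50

# A "hot" feature appears in many clients' data; a "cold" one in few.
hot_feature_clients = set(range(800))   # held by 80% of clients
cold_feature_clients = set(range(10))   # held by 1% of clients

hot_updates = cold_updates = 0
for _ in range(NUM_ROUNDS):
    sampled = random.sample(range(NUM_CLIENTS), CLIENTS_PER_ROUND)
    # Each feature's embedding vector is touched only by sampled
    # clients whose local data contain that feature.
    hot_updates += sum(1 for c in sampled if c in hot_feature_clients)
    cold_updates += sum(1 for c in sampled if c in cold_feature_clients)

# The hot feature's row receives roughly 80x more updates than the
# cold feature's row, so uniform averaging optimizes them at very
# different speeds.
print(hot_updates, cold_updates)
```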